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As more and more network-structured data sets are available, 
the statistical analysis of valued graphs has become common place. 
Looking for a latent structure is one of the many strategies used to 
better understand the behavior of a network. Several methods already 
exist for the binary case. 

We present a model-based strategy to uncover groups of nodes 
in valued graphs. This framework can be used for a wide span of 
parametric random graphs models and allows to include covariates. 
Variational tools allow us to achieve approximate maximum likeli- 
hood estimation of the parameters of these models. We provide a 
simulation study showing that our estimation method performs well 
over a broad range of situations. We apply this method to analyze 
host-parasite interaction networks in forest ecosystems. 

1. Introduction. Data sets presenting a network structure are increas- 
ingly studied in many different domains such as sociology, energy, commu- 
nication, ecology or biology [Albert and Barabasi (2002)]. Statistical tools 
are therefore needed to analyze the structure of these networks, in order to 
understand their properties or behavior. A strong attention has been paid to 
the study of various topological characteristics such as degree distribution, 
clustering coefficient and diameter [see, e.g., Barabasi and Albert (1999), 
Newman, Watts and Strogatz (2002)]. These characteristics are useful to 
describe networks but not sufficient to understand its whole structure. 

A natural and intuitive way to capture an underlying structure is to look 
for groups of edges having similar connection profiles [Getoor and Diehl 
(2004), Newman, Watts and Strogatz (2002)], which is refereed to as com- 
munity detection [Girvan and Newman (2002), Newman (2004)]. This usu- 
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ally turns into an unsupervised classification (or clustering) problem which 
requires efficient estimation algorithms since the data set at hand is ever 
increasing. 

Several attempts at community detection have been proposed in the lit- 
erature: greedy algorithms for community detection [Girvan and Newman 
(2002) and Newman (2004)] and clustering based on spectral analysis of the 
adjacency matrix of a graph [von Luxburg, Belkin and Bousquet (2008)]. 
Greedy algorithms and spectral clustering both assume that communities 
are determined by a strong within connectivity opposed to a low between 
connectivity. This might be true for so-called communities but need not be 
true for other groups of nodes. For example, a group of nodes loosely con- 
nected to each other but highly connected to a specific group of hubs have 
the same connection profile and form a homogeneous group but do not form 
a community. In addition, they do not offer an explicit generative model nor 
a criterion to select the correct number of communities. 

Model-based methods are appealing by contrast: explicit modeling of the 
heterogeneity between nodes gives different groups an intuitive and easy 
to understand interpretation. Several probabilistic models exists for ran- 
dom graphs [see Pattison and Robins (2007) for a complete review], ranging 
from the seminal Erdos-Renyi (ER) model [Erdos and Renyi (1959)] to the 
sophisticated Stochastic Block Model (SBM) [Nowicki and Snijders (2001)]. 
The ER model assumes independent and identically distributed edges which 
entails that all nodes are structurally equivalent and, thus, there is only 
one community, although a big one. The p\ model from Holland and Lein- 
hardt (1981) extended the ER model by assuming independent dyads instead 
of edges, allowing the breakthrough from undirected to directed graphs. 
But again, all nodes are structurally equivalent in the p± model. Fienberg 
and Wasserman (1981) and Fienberg, Meyer and Wasserman (1985) lifted 
these constraints by assuming the nodes are distributed among Q classes 
with different connectivity profiles. In this model, groups are easily inter- 
preted as nodes belonging to the same class. Unfortunately, Fienberg, Meyer 
and Wasserman (1985) assumes class assignments are perfectly well known, 
which rarely happens. The state of the art in terms of graph modeling is 
the SBM, inspired by Lorrain and White (1971) and introduced by Nowicki 
and Snijders (2001), which takes advantage of mixture models and unknown 
latent variables to allow an easy modeling of groups without requiring them 
to be known in advance. 

In the SBM framework, community detection boils down to three cru- 
cial steps: assignment of nodes to groups, estimation of the model param- 
eter and selection of the correct number of groups. Several authors offered 
their method to solve these issues using Bayesian methods. Nowicki and 
Snijders (2001) work with the original SBM model. Hofman and Wiggins 
(2008) work in a highly constrained version of SBM in which heterogeneity 
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is strictly limited to intra- and inter-community connection and thus char- 
acterized by only two parameters, against Q 2 in the unconstrained SBM. 
Airoldi et al. (2008) extend the SBM framework by allowing nodes to ex- 
hibit multiple communities. By contrast, Daudin, Picard and Robin (2008) 
use a frequentist approach to estimate the parameters of the SBM. The 
frequentist approach is less computation intensive than its Bayesian coun- 
terpart, whereas the Bayesian approach is supposed to better account for 
the uncertainty. With the notable exception of Nowicki and Snijders (2001), 
who use MCMC to estimate the model parameter, both lines of work make 
heavy use of variational techniques: either Variational EM [Jaakkola (2000)] 
or Variational Bayes [Attias (2000); Beal and Ghahramani (2003); Xing, 
Jordan and Russell (2003); Winn, Bishop and Jaakkola (2005)]. MCMC 
computational cost is prohibitive, effectively leading to severe size limita- 
tions (around 200 nodes). Furthermore, because of the complex likelihood 
landscape in the SBM, good mixing of the Markov Chain is hard to achieve 
and monitor. Variational approximations, by contrast, replace the likeli- 
hood by a simple surrogate, chosen so that the error is minimal in some 
sense. Frequentist and Bayesian approach then differ only in the use of 
this surrogate likelihood: Bayesians combine it to a prior distribution of 
the parameter (chosen from some suitable distribution), whereas frequen- 
tists use it directly. In all these methods, the number of groups is fixed 
during the estimation procedure and must be selected using some criterion. 
By contrast, Kemp, Griffiths and Tenenbaum (2004) propose an original 
approach where the number of groups changes and is selected during the 
estimation process. Both Bayesian and frequentist estimations approaches 
give the same kind of results: an optimal number of groups and a proba- 
bilistic assignment of nodes to groups, depending on their connection pro- 
file. However, the Bayesian estimation strategy leads to severe constraints 
on the choice of prior and hyperprior distributions. The Daudin, Picard 
and Robin (2008) maximum likelihood approach does not require any prior 
specification and is more efficient than MCMC estimation [Picard et al. 
(2007)]. 

Previous models are all models for binary networks, for which the only 
information is the presence or absence of an edge. Binary information cer- 
tainly describes the topology of a network but is a rather poor description. 
It accounts neither for the intensity of the interaction between two nodes nor 
for the specific features of an edge. The intensity of an edge may typically 
indicate the amount of energy transported from one node to another, the 
number of passengers or the number of common features between two nodes, 
whereas the specific feature of an edge may be the phylogenetic distance be- 
tween its two ending nodes. Many networks, such as power, communication, 
social, ecological or biological networks, are naturally valued and are some- 
how arbitrarily transformed to a binary graph. This transformation some- 
times conceals important results [Tykiakanis, Tscharntke and Lewis (2007)]. 
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Extending binary models and the associated estimation procedures to val- 
ued graphs with specific features allows more complexity, and more relevant 
information with it, to be processed while estimating the structure of the 
network. 

We are motivated by the search of a structure in valued graphs describ- 
ing the similarity between species within an assemblage according to their 
biotic interactions. In ecology, an assemblage is defined as a taxonomically 
related group of species that occurs in the same geographic area [Ricklefs 
and Miller (2000)]. The species composing an assemblage usually interact 
with many species belonging to other assemblages and the nature of these 
interactions is often very diverse (predator-prey interactions, host-parasite 
interactions, mutualistic interactions, competitive interactions). One of the 
questions facing ecologists is to understand what determines with whom 
a species interact. Conventional wisdom is that within an assemblage, two 
closely related species should share more interactions than two evolution- 
ary distant species because the range of interactions of a species is con- 
strained by its physiological, morphological and behavioral attributes. In 
several cases, this conventional wisdom is revealed to be true. Phylogenet- 
ically related plant species have been shown to bear similar pathogens and 
herbivores [Brandle and Brandl (2006); Gilbert and Webb (2007)] and the 
diet's range of predators has been shown to be phylogenetically constrained 
[Cattin et al. (2004)]. This tendency for phylogenetically related species to 
resemble each other is called phylogenetic signal [Blomberg and Garland 
(2002)]. In other cases, no phylogenetic signal was detected [Rezende et al. 
(2007); Vacher, Piou and Desprez-Loustau (2008)]. Selection pressures ex- 
erted by the environment might account for this absence: species have to 
adapt to varying environments to survive, diverging from close relatives in 
their physiology, morphology and behavior, and possibly developing novel 
interactions [Bersier and Kehrli (2008); Cattin et al. (2004)]. The valued 
graphs under study have species as nodes and the number of shared interac- 
tions as edges. We use a mixture model with phylogenetic distance between 
species as covariate to measure the strength of the phylogenetic signal. This 
latter is defined as the decrease in the number of selected groups due to 
the inclusion of the covariate. Two different assemblages are considered. 
The first assemblage is composed of 51 tree species occurring in the French 
forests and the second is composed of 153 parasitic fungal species also oc- 
curring in the French forests. The interactions considered are host-parasite 
interactions. We expect to find a lower phylogenetic signal in the host range 
of parasitic fungal species [Bersier and Kehrli (2008); Rossberg et al. (2006); 
Vacher, Piou and Desprez-Loustau (2008)] than in the vulnerability of tree 
species to parasites [Brandle and Brandl (2006); Gilbert and Webb (2007); 
Vacher, Piou and Desprez-Loustau (2008)]. 
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In this paper we propose an extension to the stochastic block model, intro- 
duced in Fienberg and Wasserman (1981); Fienberg, Meyer and Wasserman 
(1985); Nowicki and Snijders (2001), and the methods of Airoldi and Carley 
(2005) and Daudin, Picard and Robin (2008), that deals with valued graphs 
and accounts for possible covariates. We use a general mixture model de- 
scribing the connection intensities between nodes spread among a certain 
number of classes (Section 2). A variational EM approach to get an optimal, 
in a sense to be defined, approximation of the likelihood is then presented 
in Section 3. In Section 4 we give a general estimation algorithm and derive 
some explicit formulas for the most popular distributions. The quality of the 
estimates is studied on synthetic data in Section 5. Finally, the model is used 
to elucidate the structure of host-parasite interactions in forest ecosystems 
and results are discussed in Section 6. 

2. Mixture model. We now present the general extension of SBM to 
valued graphs and discuss the two particular modelings used for the tree 
species and fungal species interaction networks. 

2.1. Model and notation. 

Nodes. Consider a graph with n nodes, labeled in {1, . . . ,n}. In our model 
the nodes are distributed among Q groups so that each node i is associated 
to a random vector Zj = (Zn, . . . , Ziq), with Zi q being 1 if node i belongs 
to group q and otherwise. The {Zj} are supposed to be independent 
identically distributed observations from a multinomial distribution: 

(2.1) {Zi}ii.i.d.~A4(l;a), 

where a = (a±, . . . , ckq) and Yl q a q = 1. 
Edges. Each edge from a node i to a node j is associated to a random vari- 
able Xij, coding for the strength of the edge. Conditionally to the group 
of each node, or equivalently knowing the {Zj}, the edges are supposed 
to be independent. Knowing group q of node i and group t of node j, 
is distributed as f(-,0 q i) '■= fqi(-), where fg is a probability distribution 
known up to a finite-dimensional parameter 9 q f 

(2.2) ^|*G?,j'G^~/(-,^) :=/«<(■)■ 

Up to a relabeling of the classes, the model is identifiable and completely 
specified by both the mixture proportions a and the connectivity matrix 
9 = {O q e) q: e=i,... ! Q- We denote 7 = (ot,0) the parameter of the model. 
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Directed and undirected graphs. This modeling can be applied to both 
directed and undirected graphs. In the directed version, the variables Xij 
and Xji are supposed to be independent conditionally to the groups to 
which nodes i and j belong. This hypothesis is not always realistic since, for 
example, the traffic from i to j is likely to be correlated to the traffic from j to 
i . A way to account for such a dependency is to consider a undirected graph 
with edges labeled with the bivariate variables {(X^ , Xji)}i<i < j< n . All the 
results presented in this paper are valid for directed graphs. The results for 
undirected graphs can easily be derived and are only briefly mentioned. 

2.2. Modeling the number of shared hosts/parasites. In our tree interac- 
tion network, each edge is valued with the number of common fungal species 
two tree species can host. Our purpose is to understand the structure of this 
network and it is natural to model the counts X^ as Poisson distributed. The 
mixture models aims at explaining the heterogeneity of the X^. However, 
we would also like to account for some factors that are known to be influ- 
ential. In our network, we expect two phylogenetically related tree species i 
and j to share a high number X^ of parasitic species. As such, their average 
number of shared parasitic species E[Ajj] is expected to decrease with their 
phylogenetic distance yij. We consider three alternatives, and compare two 
of them. 

Poisson mixture (PM): In this mixture, we do not account for the covariates 
and X^ only depends on the classes of i and j: 



A q i is then the mean number of common fungal species (or mean inter- 
action) between a tree species from group q and one from group I and 



Poisson regression mixture with inhomogeneous effects (PRMI): In this mix- 
ture, we account for the covariates via a regression model that is specific 
to the classes of i and j: 



where yij is a vector of covariates and 9 q i = (A q £,f3 q g). 
Poisson regression mixture with homogeneous effects (PRMH): In this mix- 
ture, the effect of the covariates does not depend on the classes of % and 
j: 



Xij\i€q,j€e~V(\ qi ). 




Xa\i e 



q,jel~V{\ qi e^) 
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We point out that models PRMI and PRMH have different purposes. In 
PRMI, the link between the covariates and the edges is locally refined within 
each class (q,£), whereas in PRMH, the covariates compete globally with the 
group structure found by PM. In PRMH, the mixture looks for remaining 
structure among the residuals of the regression model. If the structure was 
completely explained by the covariates, the possibly many components found 
using PM would reduce to a single component when using PRMH. To a lesser 
extent, we expect the number of components to be smaller with PRMH than 
with PM if the phylogenetic distance explains part of the structure. As we 
look for structure beyond the one already explained by the covariates, we 
consider only models PM and PRMH. 

The same models are used for the fungal species interaction network. 
In our examples, data consist in counts, but other types of data can be 
handled with similar mixture and/or regression models (see Appendix A.l 
for details). 

3. Likelihood and variational EM. We now address the estimation of 
the parameter 7 = (a, 9). We show that the standard maximum likelihood 
approach cannot be applied to our model and propose an alternative strategy 
relying on variational tools, namely, variational EM. 

3.1. Likelihoods. Let X denote the set of all edges, X = {Xij}ij=i n , 
and Z the set of all indicator variables for nodes, Z = {Zj}$=x n . In the 
mixture model literature [McLahan and Peel (2000)] (X, Z) is referred to as 
the complete data set, while X is referred to as the incomplete data set. The 
conditional independence of the edges knowing Z entails the decomposition 
logP(Z, X) = logP(Z) +logP(X|Z). It then follows from (2.1) and (2.2) that 
the log-likelihood of the complete data set is 

(3.1) logP(Z,X) = Y,Y. Zi ^ a i + Y,J2 Zi « Z it lo %f<it( X ^- 

i q i=£j q,£ 

The likelihood of the incomplete data set can be obtained by summing 
P(Z,X) over all possible Z's: P(X) = ]P Z P(Z,X). This summation involves 
Q n terms and quickly becomes intractable. The popular E-M algorithm 
[Dempster, Laird and Rubin (1977)], widely used in mixture problems, allows 
to maximize logP(X) without explicitly calculating it. The E-step relies 
on the calculation of the conditional distribution of Z given X: P(Z[X). 
Unfortunately, in the case of network data, the strong dependency between 
edges makes this calculation untractable. 

Undirected graphs. The closed formula (3.1) still holds undirected graphs, 
replacing the sum over i 7^ j by a sum over i < j. This is also true for equa- 
tions (3.6) and (4.3) given below. 
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3.2. Variational EM. We propose to use an approximate maximum like- 
lihood strategy based on a variational approach [see Jordan et al. (1999) or 
the tutorial by Jaakkola (2000)]. This strategy is also used in Govaert and 
Nadif (2005) for a biclustering problem. We consider a lower bound of the 
log-likelihood of the incomplete data set 

(3.2) J(R X , 7 ) =logP(X; 7) - KL(R x (-),F(-\X;j)), 

where KL denotes the Kullback-Leibler divergence and i?x stands for some 
distribution on Z. Classical properties of the Kullback-Leibler divergence 
ensure that J has a unique maximum logP(X;7), which is reached for 
i?x(Z) = P(Z|X). In other words, if P(Z|X;7) was tractable, the maxi- 
mization of i7(-RX)7) with respect to 7 would be equivalent to the maxi- 
mization of logP(X;7). In our case, P(Z|X;7) is untractable and we maxi- 
mize l7(-Rx>7) with respect to both i?x and 7. Jaakkola (2000) shows that 
i7(-Rxi 7 ) can be rewritten as 

(3.3) J(Rx, 7) = n(R x ) + fl x(Z) logP(X, Z; 7), 

z 

where H(-) denotes the entropy of a distribution. The last term of (3.3) can 
be deduced from (3.1): 

^i?x(Z)logP(X,Z; 7 ) 
z 

(3.4) 

= ^ EiJ x ( Z iq) lo S a q + Y E/? x (ZiqZjt) lo § fqi( X ij)> 
i q ijtj q,i 

where Pr x denotes the expectation with respect to distribution i?x- Equa- 
tion (3.4) requires only the knowledge of Ei? x (Zj g ) and Kji x (Zi q Zj£) for all 
i,j,q,£. By contrast, %{R-x) requires all order moments of i?x and is un- 
tractable in general. Maximization of v7(-Rx; 7 ) m Rx. can not be achieved 
without some restrictions on i?x- We therefore limit the search to the class 
of completely factorized distributions: 

(3.5) R K (Z)=Y[h(Z i ,Ti) 1 

i 

where h denotes the multinomial distribution and Tj stands for a vector of 
probabilities, = (ra, . . . , T iQ ) (with Y, q T iq = !)• In particular, E Rx (Z iq ) = 
Ti q and E^ x (ZjgZ^) = Ti q Tj£. In addition, the entropy is additive over the 
coordinates for factorized distributions, so that H(Rx) = Yli ^(^("> T «)) = 
-HiYj q T iq^ T iq- Wrapping everything together, 



j (-Rx, 7 ) = - Ti i log n i + Ti i log a i 

i q i q 
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(3.6) 



+ ^2^2 T iqTje log f q£ {Xij ) . 



It is immediate from (3.6) that J7(i?x>7) is tractable for distributions Rx. 
of the form (3.5). The r^'s must be thought of as variational parameters to 
be optimized so that i?x(Z) fits P(Z|X;7) as well as possible; they depend 
on the observed data X. Since Rx is restricted to be of the form (3.5), 
i7(-Rxi7) is a lower bound of logP(X). 

Discussion about tighter bounds. A fully factorized Rx is only one class 
of distributions we can consider. Broader distribution classes should yield 
tighter bound of J{Rx,l)- Unfortunately, for more general distributions, 
the entropy Ti(Rx) may not have a simple expression anymore rendering 
the exact calculation of J{Rx,~i) untractable: better accuracy is achieved 
at the expense of tractability. A solution to this issue is Bethe free energy 
[Yedidia, Freeman and Weiss (2005)]. We did not consider it because it relies 
on an approximation of H(Rx) which disrupts the well-behaved properties 



Another approach comes from Leisink and Kappen (2001) and Mariadas- 
sou (2006). Starting from an exponential inequality, they emphasize the 
strong connection between fully factorized Rx. and first order linear approx- 
imation of the exponential function. Using a higher approximation of the 
exponential and some distribution Sx in addition to Rx, it is possible to 
derive an even tighter bound of J7"(-Rxi7)- However, the estimation algo- 
rithm is then of complexity C(n 6 Q 6 ) instead of 0(n 2 Q 2 ) for a gain which 
has the same order of magnitude as the computer numerical precision. 

4. Parameter estimation. We present here the two-steps algorithm used 
for the parameter estimation. 

4.1. Estimation algorithm. As explained in Section 3.2, the maximum 
likelihood estimator of 7 is 



In the variational framework, we restrict the last optimization problem to 
factorized distributions. The estimate we propose is hence 



The simultaneous optimization with respect to both Rx and 7 is still too 

(n) 

difficult, so we adopt the following iterative strategy. Denoting by R^ and 



of J{Rx,l)- 




7 = arg max 
7 



max 

-Rx factorized 



J(i?x,7)- 
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7^™) the estimates after n steps, we compute 

r R^ +1) = argmax J(Rx,J (n) ), 

-|\ ^ -Rx factorized 

) 7( n+1 ) = argmax l 7( J R x l+1) ,7). 

7 

The next two sections are dedicated to each of these steps. 

Initialization step. The optimization procedure (4.1) only ensures the 
convergence toward a local optimum, so the choice of the starting point for 
7 or Rx is crucial to avoid local optima. This choice is difficult, but, to 
our experience, hierarchical clustering seems to be a good strategy to get an 
initial value for Rx- 

4.2. Optimal approximate conditional distribution Rx- We consider here 
the optimization of J with respect to Rx- For a given value of 7, we 
denote r the variational parameter defining the distribution Rx = 
argmax Rx factorized^ (Rx., 7)- This amounts to maximimizing J(Rx-,l), 
given in (3.6), under the condition that, for all z, the T{ q 's must sum to 
1. The derivative of J(Rx,l) with respect to Tj g is 

- logr ig - 1 + \oga q + ^2^Tji[log fgg(Xij) + log fg q (Xji)] + Li, 

where Lj denotes the ith Lagrange multiplier. It results from the previous 
equation that the optimal variational parameter r satisfies the fixed point 
relation 

(4-2) ? iq a a, [] UlfA^M^ji)}^ • 

The fixed point relation (4.2) can be related to a mean field approxima- 
tion [see Jaakkola (2000)]. We get r simply by iterating this relation until 
convergence. 

Undirected graphs. For a undirected graph, r satisfies 

& 1 

4.3. Parameter estimates. We now have to maximize J with respect to 
7 = (ex., 6) for a given distribution Rx- Again, this amounts to maximizing 
<J(Rx,l)-, given in (3.6), under the condition that ^2 q a q = 1. Straightfor- 
ward calculations show that the optimal a and 9 are given by 

(4.3) Sq = -^Tj g , 9 q £ = arg max ^ T iq T je log f(Xjj;6). 
n .,. 
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Poisson models. Poisson models are of particular interest for our inter- 
action networks. The optimal X q £ for model PM presented in Section 2.2 is 
straightforward: 

For models PRMH and PRMI presented in the same section, there is no 
closed formula for \ q i, f3 q i or (3. However, since the Poisson regression model 
belongs to the exponential family, J is only a weighted version of the log- 
likelihoods of the corresponding generalized linear model. As such, standard 
optimization procedures can be used. 

Exponential family. The optimal 6 is not explicit in the general case, but 
has a simpler form if the distribution / belongs to the exponential family. 
Namely, if / belongs to an exponential family with natural parameter 9, 

f(x;9) = exp[*(x)'9-A(9)]. 

According to (4.3), we look for 9 = argmaxg Yli^j T iq T ji^(Xij)'9 — A{9). 
Maximizing this quantity in 9 yields 

Y,T iq T ji *(X ij )-VA(9) = 0. 

If VA is invertible, the optimal 9 is 
(4.4) 9 = (VA)~ 

4.4. Choice of the number of groups. In practice, the number of groups 
is unknown and should be estimated. Many criterion have been proposed to 
select the dimensionality Q of the latent space, ranging from AIC to ICL. 
AIC, BIC and their variants [Burnham and Anderson (1998)] are based 
on computing the likelihood of the observed data P(X|?tiq) and penalizing 
it with some function of Q. But the use of variational EM is precisely to 
avoid computation of P(X|?tt,q), which is untractable. Given a prior dis- 
tribution P(m<g) over models, and a prior distribution P(7|mg) for each 
model, variational Bayes [Beal and Ghahramani (2003)] works by selecting 
the model with maximum posterior P(ttiq|X). Estimation of P(X|mg) is 
then performed using variational EM and no penalization is required, as 
complex models are already penalized by diffuse prior P(7|mQ). Extension 
of Deviance Information Criterion (DIC) to finite mixture distributions via 
variational approximations [McGrory and Titterington (2007)] is even more 
straightforward: choosing Q* larger than the expected number of compo- 
nents and running the algorithm, extraneous classes become void as the 
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algorithm converges and the selected number of groups is just the num- 
ber of nonempty classes. In the context of unknown assignments, Biernacki, 
Celeux and Govaert (2000) proposed the Integrated Classification Likeli- 
hood (ICL), which is an approximation to the complete data likelihood 
P(X,Z|toq). Variational Bayes, BIC and ICL can all be seen as approx- 
imations to Bayes factors. Whereas Variational Bayes integrates out the 
uncertainty about the parameter and the assignment of nodes to groups, 
ICL replaces them by a point estimate, computed thanks to variational EM. 
Traditional model selection essentially involves a trade-off between goodness 
of fit and model complexity, whereas ICL values both goodness of fit and 
classification sharpness. 

Nowicki and Snijders (2001) do not propose any criterion to select the 
number of groups. Hofman and Wiggins (2008) use McGrory's method but 
in a very specific case of the Stochastic Block Model. They also give no clue 
as to how to decide that the algorithm has converged enough. Airoldi et al. 
(2008) use either a modification to BIC (for small size networks) or cross- 
validation (for large size networks) to select the number of groups. Daudin, 
Picard and Robin (2008) use a modification to ICL criterion. Following along 
the same line as Daudin, Picard and Robin (2008), we use a modification of 
ICL adapted to valued graphs to select the number of classes. 

ICL criterion: For a model tuq with Q classes where 9 involves Pq inde- 
pendent parameters, the ICL criterion is 

ICL(m Q ) = maxlogP(X,Z| 7 ,m Q ) 

7 

- 1 -{P Q log[n(n - 1)] - (Q - 1) log(n)}, 

where the missing data Z are replaced by their prediction Z. 

Note that the penalty term — \{Pq log[n(n — 1)] — (Q — 1) log(n)} is similar 
to the one of BIC, where the log term refers to number of data. In the case of 
graphs, the number of data is n (i.e., the number of nodes) for the vector of 
proportions a (Q — 1 independent parameters), whereas it is n(n — 1) (i.e., 
the number of edges) for parameter 6 (Pq independent parameters). For the 
models PM, PRMI and PRMH (detailed in Section 2.2), P Q is respectively 
Q(Q + l)/2, Q(Q + 1) and l + Q(Q + l)/2. 
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5. Simulation study. 



5.1. Quality of the estimates. 



Simulation parameters. We considered undirected networks of size n = 
100 and 500 with Q = 3 classes. To study balanced and unbalanced propor- 
tions, we set a q oc a q , with a = 1,0.5,0.2. a = 1 gives uniform proportions, 
while a = 0.2 gives very unbalanced proportions: a = (80.6%, 16.1%, 3.3%). 
We finally considered symmetric connection intensities X pq , setting X pp = X' 
for all p and X pq = A'7 for p ^ q. Parameter 7 controls the difference be- 
tween within class and between class connection intensities (7 = 0.1, 0.5, 
0.9, 1.5), while A' is set so that the mean connection intensity A (A = 2,5) 
depends neither on 7 nor a. 7 close to one makes the distinction between 
the classes difficult. 7 larger than one makes the within class connectivities 
less intense than the between ones. We expect the fitting to be rather easy 
for the combination {n = 500, a = 1, A = 5, 7 = 0.1} and rather difficult for 
{n = 100, a = 0.2, A = 2, 7 = 0.9}. 



Simulations and computations. For each combination of the parameters, 
we simulated S = 100 random graphs according to the corresponding mix- 
ture model. We fitted the parameters using the algorithm described in Sec- 
tion 4. To solve the identifiability problem of the classes, we systemati- 
cally ordered them in descending estimated proportion order: a\ > a.2 > S?3- 
For each parameter, we calculated the estimated Root Mean Squared Error 
(RMSE): 



RMSE(a p ) 



RMSE(X pq ) 



1 S 



^ g Z-A pq p i' ' 



where the superscript (s) labels the estimates obtained in simulation s. We 
also calculated the mean posterior entropy 



H 



Wi„ T W 

iq 



% q 



which gives us the degree of uncertainty of the classification. 
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1 2 3 1 2 3 1 2 3 



Fig. 1. RMSE of the estimates a q . The x-axis refers to Qi,a2,«3- Top: n — 100, bottom: 
n = 500, from left to right: a = 1,0.5,0.2. Solid line: A = 5, dashed line: A = 2. Symbols 
depend on 7; o = 0.1, V = 0.5, A = 0.9, * = 1.5. 

Results. Figure 1 (resp. 2) gives the RMSE for the proportion a q (resp. 
connection intensities X pq ). As expected, the RMSE is lower when n is larger. 
The parameters affecting the RMSE are mainly a and 7, whereas A has 
nearly no effect. The departures observed for ol\ and a% in the balanced 
case (a = 1.0) are due to the systematic reordering of the proportions. 

Since the graph is undirected, X pq = X qp , so only nonredundant parameters 
are considered in Figure 2. The overall quality of the estimates is satisfying, 
especially for the diagonal terms X qq . The within intensity parameter of the 
smallest class A33 is the most difficult to estimate. The worst case corre- 
sponds to a small graph (n = 100) with very unbalanced classes (a = 0.2) 
for parameter A12. In this case, the algorithm is unable to distinguish the 
two larger classes (1 and 2), so that the estimates extra-diagonal term A12 
is close to the diagonal ones An and A22, whereas its true value is up to ten 
times smaller. 

Figure 3 gives the mean entropy. Not surprisingly, the most influential pa- 
rameter is 7: when 7 is close to 1, the classes are almost indistinguishable. 
For small graphs (n = 100), the mean intensity A has almost no effect. Be- 



LATENT STRUCTURE IN VALUED GRAPHS 



15 




246 246 246 




Fig. 2. RMSE of the estimates X pq . The x-axis refers to An, A22, A33, A12, A13, A23. Same 
legend as Figure 1. 

cause of the identifiability problem already mentioned, we did not consider 
the classification error rate. 

5.2. Model selection. We considered a undirected graph of size n = 50, 
100, 500 and 1000 with Q* = 3 classes. We considered the combination {a = 
0.5, A = 2,7 = 0.5} which turned out to be a medium case (see Section 5.1) 
and computed ICL for Q ranging from 1 to 10 (from 1 to 5 for n = 1000) 
before selecting the Q maximizing ICL. We repeated this for S = 100 simu- 
lations. 

Figure 4 gives ICL as a function of Q, while Table 1 returns the frequency 
with which each Q is selected. As soon as n is larger than 100, ICL almost 
always selects the correct number of classes; for smaller graphs (n = 50), it 
tends to underestimate it. The proposed criterion is thus highly efficient. 

6. Uncovering the structure of host parasite interactions in forest ecosys- 
tems. Here we use mixture models to highlight the factors governing with 
whom a species interact in an ecosystem. The factors which may account for 
species interactions are introduced as covariates in the mixture models. The 
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Fig. 3. Mean (normalized) entropy H/n as a function of 7. Top: n = 100, bottom: 
n = 500, from left to right: a — 1, 0.5, 0.2. Solid line: A = 5, dashed line: A = 2. 



explanatory power of each factor is measured as the decrease in the num- 
ber of groups selected. Our study focuses on host-parasite interactions in 
forest ecosystems. We address the two following questions: (1) Is similarity 
in the parasite assemblages of two tree species explained by their phyloge- 
netic relatedness rather than by the degree of overlap of their distributional 
range? (2) Is similarity in the host range of two parasitic fungal species 
explained by their phylogenetic relatedness rather than their common nutri- 
tional strategy? The explanatory power of phylogenetic relatedness is sub- 
sequently called phylogenetic signal, as in the ecological literature [Rezende 
et al. (2007); Vacher, Piou and Desprez-Loustau (2008)]. 

6.1. Data. 

Host-parasite interaction records. We considered two undirected, valued 
networks having parasitic fungal species (n = 154) and tree species (n = 
51) as nodes, respectively. Edges strength was defined as the number of 
shared host species and the number of shared parasitic species, respectively 
[Mariadassou, Robin and Vacher (2010)]. 
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The methods used for collecting data on tree-fungus interactions are fully 
described in Vacher, Piou and Desprez-Loustau (2008). Fungal species names 
were checked since then in the Index Fungorum database (www.indexfungorum. 
org): 17 names were updated, yielding to 3 new species synonymies. The fu- 
sion of synonym species accounts for the lower number of fungal species in 
the present study than in the original publication. 



Table 1 

Frequency (in %) at which Q is selected for various sizes n 



n 



Q 50 100 500 1000 



2 82 7 

3 17 90 100 100 

4 13 
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Phylogenetic relatedness between species. In order to verify the exis- 
tence of a phylogenetic signal in the parasite assemblages of tree species, 
we estimated genetic distances between all pairs of tree species. The max- 
imally resolved seed plant tree of the software Phylomatic2 [Webb and 
Donoghue (2005)] was used to produce a phylogenetic tree for the 51 tree 
species included in our study. Then, pairwise genetic distances (in million 
years) were extracted by using the cophenetic.phylo function of the R ape 
package [Paradis, Claude and Strimmer (2004)]. Because the phylogenetic 
tree was loosely resolved for gymnosperms, we also used taxonomic dis- 
tances to estimate phylogenetic relatedness between tree species. Since all 
tree species included in the study belong to the phylum Streptophyta, we 
used the finer taxonomic ranks of class, order, family and genus to calcu- 
late pairwise taxonomic distances. Based on the NCBI Taxonomy Browser 
(www.ncbi.nlm.nih.gov/Taxonomy/), we found that the species are evenly 
distributed into two taxonomic classes (Magnoliophyta and Conipherophyta) 
and further subdivided in 8 orders, 13 families and 26 genera. Following 
Poulin (2005), we considered that the taxonomic distance is equal to if 
species are the same, 1 if they belong to the same genus, 2 to the same 
family, 3 to the same order, 4 to the same taxonomic class and 5 if their 
only common point lies in belonging to the phylum Streptophyta. 

In order to investigate the existence of a phylogenetic signal in the host 
range of parasitic fungal species, we estimated taxonomic distances between 
all pairs of fungal species. Pairwise genetic distances could not be calculated 
because genetic data were not available for all the species. Since the 153 
fungal species at hand span a wider portion of the tree of life than the tree 
species, we had to use the higher order rank of kingdom. The taxonomic 
distance for fungal species thus ranges from to 6 (kingdom level) when 
compared to to 5 for trees. The taxonomy was retrieved from Index Fun- 
gorum (www.indexfungorum.org). All fungal species included in the study 
belong to the Fungi kingdom, are divided in two phyla (Ascomycota and 
Basidiomycota) and further subdivided in 9 taxonomic classes, 21 orders, 
48 families and 107 genera. When pairs included a species whose taxonomic 
is uncertain for a given taxonomic rank, this rank was skipped and upper 
ranks were used to estimate distance. 

Other explanatory factors. Other factors than phylogenetic relatedness 
may account for pairwise similarities in parasite assemblages between tree 
species. In particular, two tree species having overlapping distributional 
range are exposed to similar pools of parasitic species and may therefore 
share more parasitic species than two tree species with nonoverlapping dis- 
tributions [Brandle and Brandl (2006)]. We tested this hypothesis by cal- 
culating the geographical distance between all pairs of tree species. The 
geographical distance is the Jaccard distance [Jaccard (1901)] computed on 
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the profiles of presence/absence in 309 geographical units covering the entire 
French territory. 

In the case of fungal species, other factors may also account for simi- 
larity in host range. Here we investigated whether fungal species having 
similar nutritional strategies also have similar host ranges. Fungal species 
were classified into ten nutritional strategies based on their parasitic lifestyle 
(biotroph or necrotroph) and on the plant organs and tissues attacked. Five 
strategies (strict foliar necrotroph parasites, canker agents, stem decay fungi, 
obligate biotroph parasites and root decay fungi) accounted for 87% of the 
fungal species. We considered that nutritional distance between two species 
equals one if the strategies are the same and otherwise. 

6.2. Identification of groups of species sharing similar interactions. 

Model. For both networks, we used the mixture model to define groups 
of tree species and fungal species having similar interactions We assumed 
that, in each network, the edge intensities were Poisson distributed. For 

Table 2 

Top: Size, mean number of interactions and Magnoliophyta content 
for each group found with PM. Bottom: Parameter estimates for the 
tree network: \ q e = mean number of shared parasitic species, ct q = 
group proportion (%) with PM (no covanate) 
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both networks, we considered the PM and PRMH models (see Section 2.2) 
using pairwise distance between species (genetic, taxonomic, geographic or 
nutritional) as a covariate. 

PM model: No covariate. In the absence of covariates, the ICL criterion 
selected 7 groups of tree species. Two groups of tree species (T2 and T5) were 
exclusively composed of species belonging to the Magnoliophyta, whereas 
three other groups (Tl, T3 and T4) were exclusively composed of species 
belonging to the Conipherophyta. The two last groups (T6 and T7) were 
mixed (Table 2). According to the mean number of interactions per species 
and the parameters estimates of the model (Table 2), they were composed 
of tree species having few parasitic species and sharing few of them with 
other tree species. 

It is noteworthy that group T2 was composed of four species belonging 
to the same order (Fagales) and also to the same family (Fagaceae). Groups 
Tl, T3 and T4 were also composed of species belonging to the same family 
(Pinaceae) since the only three coniferous species belonging to another fam- 
ily were classified in groups T6 and T7. These results confirm that two plant 
species with a similar evolutionary history are likely to share the same set 
of parasitic species [Brandle and Brandl (2006), Gilbert and Webb (2007), 
Vacher, Piou and Desprez-Loustau (2008)]. 

PRMH model: Accounting for phylogenetic relatedness. When account- 
ing for taxonomy, ICL selected only 4 groups of tree species. The estimated 
regression coefficient was /3 = —0.317, which means that, for the mean tax- 
onomic distance y = 3.82, the mean connexion intensity is reduced of 70% 
(ePy = 0.298). The cross classification table (Table 3) shows that the taxo- 
nomic distance reduces the number of class by merging groups Tl and T2 
with most of the trees of T4 and T5. T'3 essentially consists of T6, TT of 
T7 and T'2 is made of trees from T3 completed with leftovers from other 
classes. Interestingly and unlike the groups obtained with no covariates, no 
group has species belonging exclusively to one or the other of the taxonomic 
classes (Magnoliophyta or Conipherophyta): the association between group 
of trees and taxonomy was cropped out by the covariate (Table 4). The same 
results hold when using the genetic distance as a covariate instead of the 
taxonomic distance (results not shown). 

Therefore, the inclusion of taxonomic (or genetic) distance as a covariate 
shows that the phylogenetic relatedness between tree species accounts for 
a large part of the structure of tree-parasitic fungus interactions in forest 
ecosystems, but not for all the structure. Indeed, even after controlling for 
the evolutionary history through the taxonomic (or genetic) distance, ICL 
still finds 4 groups of trees, whereas we would expect only one group if the 
phylogeny was the sole source of structure. Below we investigate whether the 
distributional overlap between tree species is another source of structure. 
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PRMH model: Accounting for distributional overlap. In contrast with the 
taxonomic and genetic distance, the geographical distance between species 
does not reduce the number of groups (not shown). This result suggests that 
the current distributional overlap between tree species does not account 



Table 3 

Cross classification of the groups of tree selected found by 
PM and PRMH (with taxonomic variate as a covariate) 
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Table 4 

Top: Size, mean number of interactions (scaled by x3) and 
Magnoliophyta content for each group found with PRMH. Bottom: 
Parameter estimates for the tree network: \ q i — mean number of 
shared parasitic species, a q = group proportion (%) with PRMH (with 
covariate), j3 = covariate regression coefficient 
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for the similarity in their parasite assemblages. This result is opposite to 
the conventional wisdom in the field of community ecology, which favors 
ecological processes, taking place over short time scale, over evolutionary 
processes, taking place over longer time scales, as the main source of biotic 
interaction diversity. Our findings point out that the relative importance of 
these processes might be the other way round. 

6.3. Factors accounting for the host ranges of parasitic fungal species. 

PM model: No covariate. The ICL criterion selected 9 groups of parasitic 
fungal species. The estimates intensities X q i range from almost zero (1.4 x 
10 -3 ) to 12.1, while the group proportions a q range from 1.3% to 40.2% 
(Table 7). 

PRMH model: Accounting for phylogenetic relatedness. Accounting for 
taxonomic distance does not reduce the number of groups (not shown) , indi- 
cating a lack of phylogenetic signal in the host range of fungal species. These 
results parallel those obtained with another clustering approach [Newman 
(2004)] for the same tree-fungus network [Vacher, Piou and Desprez-Loustau 
(2008)]. They are congruent with the results obtained for other bipartite 
networks since asymmetries in the phylogenetic signal have been found in 
numerous plant-animal mutualistic networks [Rezende et al. (2007)] and in 
a host-parasite network between leaf-miner moths and parasitoid insects 
[Ives and Godfray (2006)]. In the latter case, the authors also observed a 
lack of signal through the parasite phylogeny. In the case of the tree-fungus 
network, we proposed that the very early divergence of the major fungal 
phyla may account for the asymmetric influence of past evolutionary his- 
tory [Vacher, Piou and Desprez-Loustau (2008)]: the lack of signal through 
the fungal phylogeny may be the result of parasitic fungal species splitting 
into two groups when the Conipherophyta and the Magnoliophyta diverged 
(both groups containing Ascomycota and Basidiomycota species) and the 
subsequent coevolution of each set of fungal species with its plant phylum. 
Stronger selection pressures on parasitic species than on host species might 
also account for the asymmetry of the signal [Bersier and Kehrli (2008); 
Rossberg et al. (2006)]. 

PRMH model: Accounting for nutritional strategies. Fungal Correlation 
analysis showed an association between the 9 groups selected with the PM 
model and the nutritional type. In particular, two groups of fungal species 
(F2 and F3, see Appendix A. 3) contained a high proportion of root decay 
fungi (100% and 75%, respectively). However, taking the nutritional strategy 
as a covariate does not reduce the number of groups, indicating the lack of 
'nutritional signal' in the host range of parasitic fungal species. 
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Fig. 5. Left: Observed versus predicted graph of the weighted degree Ki of node i 
(R 2 = 0.94 ). Right: Observed versus predicted graph of single edge values Xij (R 2 = 0.56 ). 
Black: regression line; red: Poisson 95% confidence interval. 

6.4. Goodness of fit. Since no covariate decreases the number of mixture 
components in the fungus interaction network, we assessed goodness of fit 
only for the tree interaction network. The goodness is assessed in two ways: 
in terms of likelihood with the ICL criterion and in terms of predictive power 
for the strength of an interaction. The ICL criterion is —2876.6 for the base 
model with no class. It jumps to —1565.6 (AICL = 1212.8) when allowing 
a mixture structure (with 7 classes). It jumps again to —1449.6 (AICL = 
116) when adding the taxonomic distance as a covariate in the model (with 
4 classes). Interestingly, adding a covariate to the 4 class mixture model 
provides a gain in goodness of fit twice as big as the gain of adding three 
additional classes (AICL = 214.2 against 98.2). But adding a covariate only 
requires one additional parameter (/?), against 21 for the three additional 
classes. 

We also assessed goodness of fit in terms of predictive power. For the 
PRMH model with 4 classes and taxonomic distance as a covariate, we 
can predict both the weighted degree Ki = YljM Xij of node i as Ki = 

YlfjYlfq i T iq T jl\l e ^ yij an d the value of a single edge fungal as Xij = 

Table 5 

Tree interaction network. Effect of different factors on the similarity in parasite 
assemblages between tree species. AICL is the gain (in log-likelihood units) obtained when 
switching from the best PM model to the best PRMH model for a given covariate 
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Nb. groups (PRMH) 
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-8.6 


overlap 
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J2 q e T ig T jt^qi e ^ Tyij • The prediction of Ki using Ki is pretty accurate (Fig- 
ure 5 left, R 2 = 0.94). The prediction of Ay using Xy is less accurate, but 
the confidence region is still pretty good (Figure 5 right, R 2 = 0.56). 

6.5. Conclusion. The structure of host-parasite interactions in forest 
ecosystems is a complex one. Some tree species share more parasites than 
others and this variability is well captured by a mixture model. However and 
as shown in Table 5, the naive mixture model deceptively captures part of 
the variability readily explained by other factors, such as the phylogenetic 
relatedness (measured either by taxonomic or genetic distance) and artifi- 
cially increases the number of groups in the mixture. Accounting for relevant 
factors decreases the number of groups selected. Using group reduction as a 
yardstick (Table 5), we conclude that similarity in the parasite assemblages 
of tree species is explained by their phylogenetic relatedness rather than their 
distributional overlap, indicating the importance of evolutionary processes 
for explaining the current patterns of inter-specific interactions. Our study 
is however inconclusive on the relative contribution of phylogenetic related- 
ness and nutritional strategy to the similarity in the host ranges of parasitic 
fungal species parasites of two parasitic fungus (Table 8 in Appendix A. 3). 
In either case, since the PRMH model still finds 4 (resp. 9) classes for the 
tree species (resp. fungal species) interaction network, a significant fraction 
of the variability remains unexplained by our predictors. 

APPENDIX 

A.l. Other mixture models. We examine here some other classical dis- 
tributions which can be used in our framework. 

Bernoulli. In some situations such as co-authorship or social networks, the 
only available information is the presence or absence of the edge. Xij is 
then supposed to be Bernoulli distributed: 

Xij\i e q,j e£~B(TT q t). 

It is equivalent to the stochastic block model of Nowicki and Snijders 
(2001) or Daudin, Picard and Robin (2008). 
Multinomial. In a social network, X^ may specify the nature of the rela- 
tionship: colleague, family, friend, etc. The Xy's can then be modeled by 
multinomial variables: 

Xij\i eq,j €£~M(l;p q t). 

The parameter 9 g £ to estimate is the vector of probability p q £ = (p^ e , . . . ,p^), 
m being the number of possible labels. 

In directed random graphs, this setting allows to account for some depen- 
dency between symmetric edges Ay and Xji. We only need to consider 
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the equivalent undirected graphs where edge is labeled with the cou- 
ple (Xij,Xji). m = 4 different labels can the be observed: (0, 0) if no edge 
exists, (1, 0) for i — > j, (1, 1) for il—j and (1, 1) for i j. 
Gaussian. Traffic networks describe the intensity of the traffic between nodes. 
The airport network is a typical example where the edges are valued ac- 
cording to the number of passengers traveling from airport i to airport j. 
The intensity X^ of the traffic can be assumed to be Gaussian: 

X i:j \i eq,j Af(n q e, (r 2 ql ), 6 qt = (^, a 2 e ). 

Bivariate Gaussian. The correlation between symmetric edges X^ and Xji 
can be accounted for, considering the undirected valued graph where edge 
is valued by (Xij,Xji), which is assumed to be Gaussian. Denoting 

Xij\i € q,j El~N(iJ, qi ,'E q £), 9 q e = (n qi ,'E q £). 

Linear regression. When covariates are available, the linear model, either 
Gaussian for real valued edges or generalized for integer valued (e.g., 
Poisson or Bernoulli) allows to include them. For example, for Gaussian 
valued edges, denoting yjj the p x 1 vector of covariates describing edge 
we set 

Xij\i € q,j € i^M{fi\ v y lh a 2 qi ). 

Simple linear regression. A case of specific interest for plant ecology is the 
simple linear homoskedastic regression with group specific intercept a q e 

Table 6 

Estimates of 9 q e for some classical distributions. Notation is defined in Section A.l. n q i 

stands for 1/ T iq T jt ■ W q £ is the diagonal matrix with diagonal term Ti q Tj£. # 
param. is the number of independent parameters in the case on directed graph, except for 
the bivariate Gaussian only defined for a nonoriented graph 
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but constant regression coefficient b. It is particularly useful when con- 
trolling for the effect of geography, which is assumed to be the same for 
all groups of plants. We then set 

Xij\i G q, j € £~N(a qi + byij,a 2 ). 

The model can again be extended to Poisson or Bernoulli valued edges 
using adequate link function. 

A. 2. Parameter estimates for other distributions. Table 6 gives the pa- 
rameter estimates for the model listed in Section A.l. The estimates of the 
mean parameter for Gaussian {n q i) distributions are the same as the es- 
timate of the probability ix q i in the Bernoulli case. The results displayed 
in this table are all straightforward. Note that all estimates are weighted 
versions of the intuitive ones. 

A. 3. Parameter estimates for the fungus interaction network. 

Table 7 

Top: Size, mean number of interactions (X ) for each group found with PM. Bottom: 
Parameter estimates for the fungus network: X q i = mean number of shared host species, 
a q — group proportion (%) with PM (no covariate). * stand for X q e — lower than 5e-3 
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F5 


F6 


F7 


F8 


F9 


Size 


2 


3 


5 


6 


7 


19 


24 


26 


62 


A 


1.68 


1.95 


1.65 


0.59 


0.85 


0.57 


0.50 


0.20 


0.12 


Xqi 




















Fl 


5.87 


7.43 


7.64 


2.24 


3.26 


2.88 


1.70 


0.96 


0.47 


F2 


7.43 


9.88 


7.29 


3.59 


4.45 


1.54 


2.77 


1.03 


0.71 


F3 


7.64 


7.29 


12.1 


4.18 


1.54 


3.59 


0.31 


1.47 


0.09 


F4 


2.24 


3.59 


4.18 


2.92 


0.50 


0.47 


0.05 


0.81 


0.03 


F5 


3.26 


4.45 


1.54 


0.50 


2.66 


0.41 


1.91 


0.17 


0.38 


F6 


2.88 


1.54 


3.59 


0.47 


0.41 


2.35 


* 


0.32 


* 


FT 


1.70 


2.77 


0.31 


0.05 


1.91 


* 


1.61 


0.01 


0.18 


F8 


0.96 


1.03 


1.47 


0.81 


0.17 


0.38 


0.01 


0.25 


* 


F9 


0.47 


0.71 


0.09 


0.03 


0.38 




0.18 


* 


0.13 


a q 


1.3 


2.0 


3.3 


3.9 


4.6 


12 


16 


17 


40 
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Table 8 

Fungus interaction network. Effect of different factors on the similarity of host ranges 
between fungal species. AICL is the gain (in log-likelihood units) obtained when switching 
from the best PM model to the best PRMH model for a given covariate 



Factor 


Covariate 


Nb. of groups (PM) 


Nb. of groups (PRMH) 


AICL 


Phylogenetic 


Taxonomic distance 


9 


9 


NA 


relatedness 










Nutritional 


Trivial distance 


9 


9 


NA 


strategy 











SUPPLEMENTARY MATERIAL 

Interaction network between tree and fungal species 

(DOI: 10.1214/07-AOAS361SUPP; .csv). This file contains: 

• The adjacency matrix of interactions between tree and fungal species. 

• The list of the tree species. 

• The list of the fungal species. 

• The matrix of genetic distances between tree species. 

• The matrix of geographical distances between tree species. 

• The matrix of taxonomic distances between fungal species. 

• The matrix of nutritional type of the fungal species. 
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